Phonotactic Model for Spoken Language Identification in Indian Language Perspective
نویسندگان
چکیده
Indian Languages are Indo-Aryan being influenced by Sanskrit or Dravidian being influenced by Tamil. Dravidian Languages have the influence of Sanskrit also. All Indian Languages have the influence of Pali language for which the graphemes are being influenced Brahmi. All the Indian languages are phonetic in nature. Every Indian language has its distinctive phone sets. North Indian languages are IndoAryan and South Indian Languages are Dravidian. Considering their respective Phonetic properties during speaking we have tried to consider the special CV behaviour of the language in their syllables and are able to identify the Language analysing it with the limited training data set available using the SVM Classifier. During this process we have analysed the PPR Language Modelling concept for four major Indian languages like Hindi, Bengali, Oriya, and Telugu and the results are quite appreciable. General Terms Spoken Language Identification, Speech Processing, Support Vector Machine
منابع مشابه
مقایسه روش های طیفی برای شناسایی زبان گفتاری
Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...
متن کاملA Phonotactic Language Model for Spoken Language Identification
We have established a phonotactic language model as the solution to spoken language identification (LID). In this framework, we define a single set of acoustic tokens to represent the acoustic activities in the world’s spoken languages. A voice tokenizer converts a spoken document into a text-like document of acoustic tokens. Thus a spoken document can be represented by a count vector of acoust...
متن کاملTowards High Performance Phonotactic Feature for Spoken Language Recognition
With the demands of globalization, multilingual speech is increasingly common in conversational telephone speech, broadcast news and internet podcasts. Therefore, automatic spoken language recognition has become an important technology in multilingual speech related applications. For example, automatic spoken language recognition has been used as a preprocessing component for spoken language tr...
متن کاملThe L2F Language Verification System for NIST LRE 2009
This paper presents a description of the INESC-ID’s Spoken Language Systems Laboratory (LF) Language Verification system submitted to the 2009 NIST Language Recognition evaluation. The LF system is composed by the fusion of eight individual sub-systems: four phonotactic systems and four acoustic based methods. Language recognition results have been submitted for the “closed-set”, “open-set” and...
متن کاملPhonotactic spoken language identification with limited training data
We investigate the addition of a new language, for which limited resources are available, to a phonotactic language identification system. Two classes of approaches are studied: in the first class, only existing phonetic recognizers are employed, whereas an additional phonetic recognizer in the new language is created for the second class. It is found that the number of acoustic recognizers emp...
متن کامل